While numerical methods that utilize partitions of equal size, including the box-counting method, remain the most popular choice for computing the generalized dimensions of multifractal sets, two mass-oriented methods are investigated here by applying them to the one-dimensional generalized Cantor set. We show that both mass-oriented methods generate relatively good results for the generalized dimensions in important cases where the box-counting method is known to fail. Both the strengths and limitations of the methods are also discussed.


Fractal sets are characterized by self-similarity, and power laws can be associated with them. Examples of fractals in nature are ubiquitous. Their discovery led to the extension of the notion of dimension. For monofractals, the scaling pattern is homogeneous, while it varies over the set for multifractals. By introducing the generalized dimension $D_q$, not only can a non-integer dimension be assigned to a set, but a spectrum of dimensions can also be attributed to a single set if the set is a multifractal. In finding the generalized dimensions, the box-counting method has been by far the most popular choice among researchers across various fields. However, it is known that the class of methods which deal with partitions of equal size, including the box-counting method, is ill-suited for computing the generalized dimensions on some domain of $q$. In this paper, two promising methods which utilize mass-oriented partitions, rather than partitions of equal size, are investigated.


I. INTRODUCTION 

Fractals are mathematical sets characterized by self-similarity. While the history of the study of fractals goes back as far as the 17th century,$^{1}$ the concept was popularized by Mandelbrot in the 1970s$^{2}$ and is now applied to many fields, from cosmology$^{3}$ and chemistry$^{4}$ to economics.$^{5}$ The fact that fractals can be found virtually everywhere suggests that there is an underlying mathematical principle. From a geometrical perspective, a given set is self-similar when it is similar to a part of itself. A moment of thought convinces us that, to achieve this condition, a self-similar set needs to possess an infinitely nested structure. Due to this self-similarity, fractals may be preserved after appropriate magnification and translation. Therefore, power laws arise naturally in the study of fractals, as the power law is the only differentiable function that does not change its form under a scale transformation. To be precise, if a differentiable function $f$ satisfies $f(bx) = g(b) f(x)$ for all $b > 0$ for some function $g$, then $f$ must be a power law. Since $x \to bx$ is a scale transformation, the function $f$ is said to be preserved up to a constant under a scale transformation. Accordingly, various power laws can be derived from fractal sets, and it is with the exponents of these power laws that the dimensions of the fractal set are associated.
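The claim that the functional equation forces a power law can be checked with a short calculation. The sketch below assumes $x > 0$, $f(x) \neq 0$, and that $g$ is differentiable at $b = 1$; the constants $c$ and $A$ are names introduced here for illustration.

```latex
\begin{align*}
  f(bx) &= g(b)\, f(x)
    && \text{(scaling condition, all } b > 0\text{)} \\
  x\, f'(bx) &= g'(b)\, f(x)
    && \text{(differentiate with respect to } b\text{)} \\
  x\, f'(x) &= c\, f(x), \qquad c \equiv g'(1)
    && \text{(set } b = 1\text{)} \\
  \frac{d}{dx} \ln |f(x)| &= \frac{c}{x}
    \;\Longrightarrow\; f(x) = A\, x^{c}
    && \text{(integrate)}
\end{align*}
```

Substituting $f(x) = A x^{c}$ back into the scaling condition gives $g(b) = b^{c}$, so the scale factor is itself a power of $b$ with the same exponent.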

Traditionally, the dimension of a given set indicates the number of independent variables required to specify an element within the set and so can take only integer values. However, if we want to associate a “size” with fractals such as the famous Koch snowflake,$^{6}$ we need to extend our notion of dimension as well as of measure. The Koch snowflake is nowhere differentiable and consists of a perimeter of infinite length enclosing a finite area. Intuitively, the dimension of the set should be larger than that of a finite interval and smaller than that of a finite area. Indeed, we can define the fractal dimension in such a way that the Koch snowflake has dimension $\log 4/\log 3 = 1.261\ldots$. In this example, the fractal dimension is smaller than the dimension of the space in which the set is embedded. Note that the fractal dimension can be non-integer. Here, only a single dimension is associated with the set and so the Koch snowflake is said to be monofractal. Monofractals are a type of fractal for which the associated power laws are homogeneous within the whole set. If more than one scaling law, and therefore more than one exponent, is required, the set is said to be multifractal. Accordingly, a single dimension cannot fully capture the dimensionality of multifractal sets. To resolve this issue, the generalized dimension $D_q$ was introduced by Rényi.$^{7}$ The index $q$ can take any real value and therefore a spectrum of dimensions can now be attributed to a given set. For a monofractal, the generalized dimension $D_q$ is the same for every $q$. In this formulation, more familiar fractal dimensions such as the box-counting dimension ($D_0$), the information dimension ($D_1$),$^{8}$ and the correlation dimension ($D_2$)$^{9}$ are special cases of the generalized dimensions. However, since few fractals can be characterized analytically, the search for effective numerical methods is inevitable. Thus far, the box-counting method has been the most popular among researchers despite its difficulty in accurately computing the generalized dimension on some domains of $q$. The difficulty is rooted in the fact that numerical methods are required to deal with a finite representation of true fractal sets. Therefore, the sampling process from a theoretical set needs to be carefully handled.

In this work, two promising numerical methods for obtaining generalized fractal dimensions are examined. One of the methods utilizes the probability distribution of the nearest neighbor distances among randomly chosen points within a given set.$^{10}$ The other method involves the collection of distances to the $k$th nearest neighbor as $k$ increases.$^{11}$ They can be applied, in principle, to any set as long as a sufficient number of sample points can be taken from the set. Unlike the box-counting method, which employs a partition composed of equal-sized cells, the two methods examined in this paper employ mass-oriented partitions. The nearest neighbor method utilizes partitions composed of equal-mass cells, while the k-neighbor method uses partitions composed of cells with cumulative mass. These alternative approaches enable one to compute the generalized dimension on the domain where the box-counting method encounters difficulty. Another advantage of these methods is their ability to generate a spectrum of generalized dimensions almost simultaneously, and therefore they are particularly suited for the analysis of multifractals.

This work was originally motivated by the emergence of fractal patterns in the one-dimensional universe model.$^{12,13}$ Thus, our focus is on one-dimensional sets, although the numerical methods used in this paper can be applied to higher dimensional spaces. The analysis of fractal dimension should give us some insight into the fractal structures which arise in many chaotic systems. In particular, we applied the methods to the generalized Cantor set. The generalized dimensions of the generalized Cantor set can be readily derived analytically, thus enabling the accuracy of the numerical methods to be verified. We sampled points from the finite representation of the generalized Cantor set according to the weight assigned to each interval. In general, numerical methods need to deal with finite samples, which often gives rise to technical difficulties. No finite sample is a true fractal set, and therefore the statistical data extracted from a finite sample may not accurately reflect the properties of the original set one wishes to study. It is worth noting that simply increasing the number of sample points from an available data set can partially overcome the difficulties associated with numerical methods. While a true mathematical fractal is characterized by an infinitely nested structure, “fractals” found in nature have a limited hierarchical structure and the range where a power law is observed is finite. Accordingly, when employing a numerical method, one is required to determine the applicability of the method in relation to a finite sampling process. The generalized Cantor set is an ideal set in that the degree of hierarchy can be readily controlled. It turns out that the nearest neighbor method suffers from the presence of singularities on a certain domain of $q$ in the generalized dimensions, and therefore the range on which the method provides a reliable result is restricted. Nevertheless, for the computation of the box-counting dimension ($q = 0$) as well as $D_q$ for $q$ near 0, both methods managed to generate results which agree well with the theoretical values within a reasonable amount of computational time.

The paper is organized as follows. In Section II, the important definitions and notations are stated. In Section III, we explain the nearest neighbor method and the k-neighbor method in depth. In Section IV, we discuss some of the issues particular to numerical simulations. Section V includes an overview of our results and various raw data obtained using the aforementioned methods. Mathematical methods are employed to analyze the results in Section VI. In Section VII, a summary and conclusions are provided.


II. DEFINITIONS 

A. Generalized Cantor Set 

The Cantor set is one of the most iconic fractals and is readily generalized to a multifractal set. Accordingly, we use the generalized Cantor set as our principal test set to which the numerical methods are applied. It is constructed in the following way. Start with an interval of unit length. Then remove the middle part of the interval in such a way that the remaining interval on the left has length $l_0$ and the one on the right has length $l_1$. Moreover, a weight is assigned to each interval, namely $p_0$ or $p_1$, such that $p_0 + p_1 = 1$. The same procedure is applied to each of the two remaining intervals, which results in four intervals with lengths, starting from the left, $l_0^2$, $l_0 l_1$, $l_1 l_0$, $l_1^2$ and weights $p_0^2$, $p_0 p_1$, $p_1 p_0$, $p_1^2$. In general, after $m$ such iterations, $2^m$ intervals with various length and weight factors remain. A generalized Cantor set is what remains in the limit $m \to \infty$. In particular, the standard uniform Cantor set is obtained for $l_0 = l_1 = 1/3$ and $p_0 = p_1 = 1/2$. Another special case, referred to as the multiplicative binomial process, or MBP, is defined by $l_0 = l_1 = 1/2$ with arbitrary weights.$^{14}$
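The construction can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the results reported here; the function name `cantor_intervals` and its argument names are assumptions introduced for this example.

```python
from typing import List, Tuple


def cantor_intervals(m: int, l0: float, l1: float, p0: float) -> List[Tuple[float, float, float]]:
    """Return (left_end, length, weight) for the 2**m intervals at hierarchy degree m."""
    if l0 <= 0.0 or l1 <= 0.0 or l0 + l1 > 1.0:
        raise ValueError("need l0 > 0, l1 > 0 and l0 + l1 <= 1")
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("need 0 <= p0 <= 1")
    p1 = 1.0 - p0
    intervals = [(0.0, 1.0, 1.0)]  # degree 0: the unit interval with total weight 1
    for _ in range(m):
        refined = []
        for left, length, weight in intervals:
            # Left child: keeps the fraction l0 at the left end, weight scaled by p0.
            refined.append((left, length * l0, weight * p0))
            # Right child: keeps the fraction l1 at the right end, weight scaled by p1.
            refined.append((left + length * (1.0 - l1), length * l1, weight * p1))
        intervals = refined
    return intervals
```

For example, `cantor_intervals(2, 1/3, 1/3, 0.5)` gives four intervals of length 1/9 starting at 0, 2/9, 2/3 and 8/9, each carrying weight 1/4.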

Note that, unless $m = \infty$, the set is not a true Cantor set. For finite $m$, the set will be referred to as the finite representation of the Cantor set with hierarchy degree $m$. Now, at the $m$th degree, the weight assigned to each interval is given by $p_k^{(m)} = p_0^{m-k} p_1^{k}$. The index $k$ runs from 0 to $m$ and depends on the location of the associated interval. Similarly, we can denote the length of each segment at the $m$th level by $l_k^{(m)} = l_0^{m-k} l_1^{k}$. Then there exists an $\alpha_k \in \mathbb{R}$ such that $p_k^{(m)} = (l_k^{(m)})^{\alpha_k}$. In general, $\alpha_k$ depends on $k$ unless $p_0 = p_1$ and $l_0 = l_1$. Such an $\alpha_k$ is called the local dimension or singularity. Therefore, the uniform Cantor set has a single value for $\alpha_k$ and is said to be monofractal. Otherwise, generalized Cantor sets are multifractal, meaning that the local dimension varies from place to place within a set. If $N_{\alpha_k}$ denotes the number of segments with the local dimension $\alpha_k$, we define $f(\alpha_k)$ such that it satisfies the following relation:

$$N_{\alpha_k} = \left(l_k^{(m)}\right)^{-f(\alpha_k)} \tag{1}$$

The function $f(\alpha)$ is called the spectrum of scaling indices$^{15}$ and is related to the Rényi dimension that is discussed in the next section.


B. Renyi Dimension 

As mentioned in the introduction, the traditional notion of dimension can be extended to generate a spectrum of dimensions for a given set. While a single characteristic dimension is associated with monofractals, a spectrum of dimensions is required to reflect the properties of multifractals. Suppose $C = \{U_i\}$ is a cover of a set $A \subset \mathbb{R}^d$. Let $n_i$ denote the number of points in $U_i$ among $n$ randomly chosen points from $A$. Then a probability $p_i$ is associated with $U_i$ for each $i$ by $p_i = \lim_{n \to \infty} n_i/n$. For any real number $q \neq 1$, the generalized dimension $D_q$ of the set $A$ is given by$^{16}$

$$D_q = -\frac{1}{1-q} \lim_{\epsilon \to 0} \frac{\ln \sum_{i=1}^{N(\epsilon)} p_i^q}{\ln \epsilon} \tag{2}$$

where $N(\epsilon)$ is the number of sets with diameter $d(U_i) = \epsilon$ required to cover the set $A$. For $q = 1$, the limiting case $q \to 1$ is used. When applied to traditional geometric objects, the definition recovers the topological dimension; in particular, it gives 1 for a line interval. The generalized dimension is also known as the Rényi dimension, named after the Hungarian mathematician Alfréd Rényi, as it can be formulated using the Rényi entropy $K_q$,

$$K_q = \frac{\ln \sum_{i=1}^{N(\epsilon)} p_i^q}{1-q} \tag{3}$$


Eq. (3) can be regarded as the generalized form of Shannon's entropy. In fact, in the limit $q \to 1$, the Rényi entropy $K_q$ reduces to the familiar expression:

$$K_1 = -\sum_{i=1}^{N} p_i \ln p_i \tag{4}$$
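The $q \to 1$ limit can be verified directly. The following worked step is a sketch that assumes the weights are normalized, $\sum_i p_i = 1$, so that both the numerator and the denominator of $K_q$ vanish at $q = 1$ and L'Hôpital's rule applies.

```latex
\begin{align*}
  K_1 = \lim_{q \to 1} \frac{\ln \sum_i p_i^{q}}{1 - q}
      &= \lim_{q \to 1}
         \frac{\dfrac{d}{dq} \ln \sum_i p_i^{q}}{\dfrac{d}{dq} (1 - q)} \\
      &= \lim_{q \to 1}
         \frac{\sum_i p_i^{q} \ln p_i \,\big/\, \sum_i p_i^{q}}{-1} \\
      &= -\sum_i p_i \ln p_i .
\end{align*}
```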


Using the Rényi entropy $K_q$, the generalized dimension can be formulated as:

$$D_q = -\lim_{\epsilon \to 0} \frac{K_q}{\ln \epsilon} \tag{5}$$

Note that when $q = 0$, the Rényi dimension coincides with the box-counting dimension $D_0$:

$$D_0 = -\lim_{\epsilon \to 0} \frac{\ln N(\epsilon)}{\ln \epsilon} \tag{6}$$

In other words, for sufficiently small $\epsilon$, the following relation is satisfied:

$$N(\epsilon) \sim \epsilon^{-D_0} \tag{7}$$

The equation above is an example of the power law relations that can be derived from a given set. Note that the box-counting dimension has the opposite sign of the exponent.

In the case of the $m$th finite representation of the generalized Cantor set, the natural cover would be the broken intervals themselves, and so the weight of each interval $p_k^{(m)}$ may be used for $p_i$ in Eq. (2). Then it can be readily shown that for the uniform Cantor set with $l_0 = l_1 = 1/3$ and $p_0 = p_1 = 1/2$, we have

$$D_q = \frac{\ln 2}{\ln 3} \tag{8}$$

for all $q$. Therefore, the Rényi dimensions of the uniform Cantor set are $q$-independent and hence the set is a monofractal. On the other hand, applied to the MBP, it can be shown that$^{16}$

$$D_q = \frac{1}{q-1} \frac{\ln\left(p_1^q + p_2^q\right)}{\ln l} \tag{9}$$

where $p_1$ ($p_2$) is the weight of the left (right) interval and $l = l_1 = l_2 = 1/2$ is the length of the segments at the first iteration. Thus, the MBP is a multifractal set. There is no explicit formula for $D_q$ when $l_1 \neq l_2$, but the dimension $D_q$ can be found from an implicit relationship that employs the spectrum of scaling indices $f(\alpha)$ and the Legendre transform.$^{15}$ For a general set, it is often difficult, if not impossible, to find appropriate covers. Thus, methods which permit numerical simulations should be sought.
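As a numerical illustration of Eqs. (2) and (9), the sketch below evaluates the closed form for two equal-length segments and compares it with the partition sum over the $2^m$ intervals at a finite hierarchy degree. It is a minimal sketch rather than the code used in this work: the function names are hypothetical, and it assumes $0 < p_1 < 1$.

```python
import math


def renyi_dimension_mbp(q, p1, l=0.5):
    """D_q from Eq. (9) for two segments of equal length l with weights p1 and 1 - p1."""
    p2 = 1.0 - p1
    if abs(q - 1.0) < 1e-12:
        # Limiting case q -> 1: D_1 = (p1 ln p1 + p2 ln p2) / ln l.
        return (p1 * math.log(p1) + p2 * math.log(p2)) / math.log(l)
    return math.log(p1 ** q + p2 ** q) / ((q - 1.0) * math.log(l))


def renyi_dimension_at_level(q, p1, m, l=0.5):
    """Finite-m value of Eq. (2) with the 2**m intervals as the cover (q != 1)."""
    p2 = 1.0 - p1
    # math.comb(m, k) intervals carry the weight p1**(m - k) * p2**k.
    partition_sum = sum(
        math.comb(m, k) * (p1 ** (m - k) * p2 ** k) ** q for k in range(m + 1)
    )
    epsilon = l ** m  # common length of the intervals at degree m
    return -math.log(partition_sum) / ((1.0 - q) * math.log(epsilon))
```

With $l = 1/3$ and equal weights the first function returns $\ln 2/\ln 3$ for every $q$, in line with Eq. (8); with $l = 1/2$ and unequal weights it decreases as $q$ increases, which is the multifractal signature.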


III. NUMERICAL METHODS 

In this section, three numerical methods for computing the Rényi dimensions are discussed.


A. Box-Counting Method 

This method is probably the most well known and is closely related to the original definition of the Rényi dimensions. There are a few slightly different versions under the name of the box-counting method, using “spheres” instead of “boxes” for example,$^{17}$ but the underlying ideas are similar: generally, the number of cells $n(\epsilon)$ required to cover the points in a given set changes as the size of the partitions $\epsilon$ changes. The scaling relation can be extracted for a fractal set as the size of the partitions decreases, namely,

$$D = -\lim_{\epsilon \to 0} \frac{\ln n(\epsilon)}{\ln \epsilon} \tag{10}$$


Due to the simplicity of the method, it is widely used among researchers. However, it has been pointed out by many that this method and, more generally, methods that involve partitions of the same size, such as the correlation method, do not work well for $q < 1$.$^{18}$ A heuristic explanation of this result is given below. In Eq. (2), we can see that if $q > 1$, the contribution from relatively large $p_i$ is emphasized, and if $q < 1$, the contribution from relatively small $p_i$ plays the dominant role. The larger the value of $|q|$, the greater the effective discrimination. Therefore, the fact that the method does not produce a good result for $D_q$ with $q < 1$ means that the sparse regions of the set are not well represented in the finite representation of the Cantor set. Since a true fractal possesses an infinite number of points or elements, any finite set may not be large enough to represent the true Cantor set in relatively sparse regions. In some instances, a finite representation of a fractal may be thought of as a subset of a corresponding fractal, as in the Hénon map.$^{19}$ As the size of the cells diminishes, the truncated finite sample no longer statistically represents the sparse regions of a true fractal. Under the same condition, the dense regions are affected less by the finite size effect. Since numerical methods always have to deal with a finite sample, different methods need to be considered to find an accurate result for $q < 1$.
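For reference, a fixed-size (box-counting) estimate of $D_q$ from a finite sample of one-dimensional points can be sketched as follows. This is a hedged illustration, not the implementation used here: the function names are hypothetical, the boxes are aligned at the origin, and the limit in Eq. (2) is replaced by a least-squares slope over a user-supplied list of box sizes.

```python
import math
from collections import Counter


def partition_sum(points, eps, q):
    """Sum of p_i**q over the occupied boxes of size eps, with p_i = n_i / n."""
    counts = Counter(math.floor(x / eps) for x in points)
    n = len(points)
    return sum((c / n) ** q for c in counts.values())


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def box_counting_dimension(points, eps_values, q):
    """Estimate D_q (q != 1) as the slope of ln(sum p_i**q) vs ln(eps), over (q - 1)."""
    if q == 1:
        raise ValueError("q = 1 requires the limiting form and is not handled here")
    xs = [math.log(eps) for eps in eps_values]
    ys = [math.log(partition_sum(points, eps, q)) for eps in eps_values]
    return least_squares_slope(xs, ys) / (q - 1.0)
```

For negative $q$, boxes holding only one or two sample points dominate the partition sum, which is one way to see the finite-size problem described above.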


B. Nearest Neighbor Method 


FIG. 1. [Plot: the scaling of $\gamma \ln(n)$ vs. $-\log(M_\gamma(n))$ for the uniform Cantor set; horizontal axis $-\log(M_\gamma(n))$.] For the uniform Cantor set, $\gamma \ln(n)$ vs. $-\ln(M_\gamma(n))$ is plotted for each $\gamma$ as $n$ is increased. According to Eq. (13), the slope converges to $D(\gamma)$. The corresponding result for $D(\gamma)$ is shown in Fig. 2.

FIG. 2. [Plot: $D(\gamma)$ for the uniform Cantor set with the nearest neighbor method; horizontal axis $\gamma$.] In this graph, the Dimension Function $D(\gamma)$ for the uniform Cantor set was computed as the slope of the best-fit line to the corresponding data set, which is partially plotted in Fig. 1. For negative $\gamma$, $D(\gamma)$ diverges from the analytical result, which is $\log 2/\log 3$.


The approach called the “nearest neighbor method” was first introduced by Badii and Politi.$^{10}$ This method is essentially based on their observation that

$$\langle \delta \rangle \sim n^{-1/D_0} \tag{11}$$

where $\langle \delta \rangle$ denotes the mean distance from each point to its nearest neighbor among $n$ randomly chosen points from a given test set and, as discussed earlier, $D_0$ is just the box-counting dimension. By naturally extending the premise, the Dimension Function $D(\gamma)$ can be computed by using the moments of order $\gamma$ of the distribution function $P(\delta, n)$ generated by an ensemble of $n$ randomly chosen points:

$$\langle \delta^{\gamma} \rangle \equiv M_\gamma(n) = \int_0^{\infty} \delta^{\gamma} P(\delta, n)\, d\delta = K n^{-\gamma/D(\gamma)} \tag{12}$$

where $K$ is some function of $n$ and $\gamma$ which asymptotically remains bounded as $n$ becomes large. Here, the meaning of $\gamma$ should be clear: the dense regions of a given set generate smaller values of $\delta$, the distance to the nearest neighbor, and vice versa. The proof of a more general relation is provided by van de Water and Schram.$^{11}$ It follows that the Dimension Function $D(\gamma)$ can be obtained by:

$$D(\gamma) = -\lim_{n \to \infty} \frac{\gamma \ln n}{\ln M_\gamma(n)} \tag{13}$$


The function $K$ generally depends on $n$ and $\gamma$, but $K$ should be, by definition, irrelevant in the limiting case as in Eq. (13). In numerical analysis, the value of $K(n, \gamma)$ does affect the numerical result as $n$ is finite. The Dimension Function $D(\gamma)$ can be thought of as an alternative generalized dimension and is related to the Rényi dimension by:$^{10}$

$$D[\gamma = (1 - q) D_q] = D_q \tag{14}$$

As the equation suggests, once $D(\gamma)$ is obtained, the generalized dimension $D_q$ can be found as the intersection of $D(\gamma)$ and the straight line with slope $(1 - q)^{-1}$ which passes through the origin, as illustrated in Fig. 3.
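Locating this intersection numerically amounts to solving $\gamma = (1 - q) D(\gamma)$ for $\gamma$. The sketch below does so by bisection; it is an illustrative assumption rather than the procedure used in this work. The function name and the bracketing interval `[gamma_lo, gamma_hi]` are hypothetical, and `dim_func` stands for any callable returning $D(\gamma)$, such as an interpolation of simulated values.

```python
def renyi_from_dimension_function(dim_func, q, gamma_lo, gamma_hi, tol=1e-12):
    """Solve gamma = (1 - q) * D(gamma) by bisection; return (gamma, D_q)."""

    def mismatch(gamma):
        # Zero exactly where the curve D(gamma) meets the line through the origin.
        return gamma - (1.0 - q) * dim_func(gamma)

    lo, hi = gamma_lo, gamma_hi
    f_lo, f_hi = mismatch(lo), mismatch(hi)
    if f_lo == 0.0:
        return lo, dim_func(lo)
    if f_hi == 0.0:
        return hi, dim_func(hi)
    if f_lo * f_hi > 0.0:
        raise ValueError("the interval does not bracket an intersection")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = mismatch(mid)
        if f_mid == 0.0:
            return mid, dim_func(mid)
        if (f_mid > 0.0) == (f_hi > 0.0):
            hi, f_hi = mid, f_mid  # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    gamma = 0.5 * (lo + hi)
    return gamma, dim_func(gamma)
```

For $q = 0$ the line is $y = \gamma$, so the returned pair satisfies $D_0 = \gamma$; for $q = 1$ the root is $\gamma = 0$ and $D_1 = D(0)$.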


FIG. 3. [Plot title: MBP; horizontal axis $\gamma$.] The solid curve is the simulated result of the Dimension Function for the MBP. Note how $D_q$ can be obtained by locating the corresponding intersections. For example, the box-counting dimension $D_0$ can be found at the intersection of $D(\gamma)$ and $y = \gamma$.

For most cases, the generalized dimension $D_q$ is uniquely determined from $D(\gamma)$. Note that a larger $q$ does not correspond to a larger $\gamma$ due to the negative sign in the equation. The index $\gamma$ nonetheless plays a role similar to that of $q$ in that it discriminates the range of density of a given set that most strongly contributes to $D(\gamma)$. In simulations, the Dimension Function $D(\gamma)$ is obtained using Eq. (13). The formula can, in principle, be applied to sets with any topological dimension. In the case of a one-dimensional set, sample points are prepared in such a way that $\delta$ is bounded from above by 1. Therefore, the integral in Eq. (12) can be taken from 0 to 1. Unlike the box-counting method, this algorithm does not make use of partitions of the same size but, rather, of the same “mass”, for it can be considered that each element of the partition contains two points, namely a reference point and its nearest neighbor. Badii and Politi used a slightly improved version of the method which uses partitions containing three or four points to smooth out local statistical anomalies.$^{10}$ Broggi used partitions containing up to 300 points for systems of large dimensionality.$^{20}$
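A minimal one-dimensional sketch of the nearest neighbor estimate is given below. It is not the code used for the results in this paper: the function names and the `sampler` callable (which should return `n` points drawn from the test set) are assumptions, and the limit in Eq. (13) is replaced by the slope of $\gamma \ln n$ against $-\ln M_\gamma(n)$, as in Fig. 1. It assumes $\gamma \neq 0$ and, for negative $\gamma$, that no two sample points coincide.

```python
import math
import random


def nearest_neighbor_distances(points):
    """Distance from each point to its nearest neighbor (returned in sorted-point order)."""
    xs = sorted(points)
    last = len(xs) - 1
    distances = []
    for i, x in enumerate(xs):
        left = x - xs[i - 1] if i > 0 else math.inf
        right = xs[i + 1] - x if i < last else math.inf
        distances.append(min(left, right))
    return distances


def moment(points, gamma):
    """Sample estimate of M_gamma(n), the mean of delta**gamma over all points."""
    distances = nearest_neighbor_distances(points)
    return sum(d ** gamma for d in distances) / len(distances)


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def dimension_function(sampler, n_values, gamma, ensembles=10):
    """Estimate D(gamma) as the slope of gamma*ln(n) against -ln(M_gamma(n))."""
    xs, ys = [], []
    for n in n_values:
        # Ensemble average of the moment over independent samples of size n.
        m_gamma = sum(moment(sampler(n), gamma) for _ in range(ensembles)) / ensembles
        xs.append(-math.log(m_gamma))
        ys.append(gamma * math.log(n))
    return least_squares_slope(xs, ys)
```

As a sanity check, a sampler drawing uniformly from the unit interval, for example `lambda n: [rng.random() for _ in range(n)]` with `rng = random.Random(1)`, should give a value of $D(1)$ close to 1.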


C. k-Neighbor Method

Another method, called the “k-neighbor” method, is similar to the nearest neighbor method in that its partitions are taken according to the mass (the number of points) inside. However, instead of fixing the mass as in the case of the nearest neighbor method, the k-neighbor method incorporates a partition of cumulative mass. In fact, the nearest neighbor method is a special case of the k-neighbor method with $k = 1$. By not limiting to $k = 1$, the scaling property is obtained through the global structure of a given set, and thus the method is less sensitive to local statistical anomalies which often arise in a finite sample set. A similar global approach was introduced by Tél et al.$^{21}$ using elements of different size, rather than different mass, and some literature misleadingly refers to it as the “cumulative mass” method.$^{22}$ The k-neighbor method records the distance $\delta(k, n)$ from a reference point to the $k$th neighbor point among $n - 1$ randomly chosen points from a given set. van de Water and Schram formulated a technique for evaluating $D(\gamma)$ from the average of $\delta(k, n)^{\gamma}$ by using the local dimension introduced in Section II.$^{11}$ The average of $\delta(k, n)^{\gamma}$ is defined as follows:

$$\Delta^{(\gamma)}(k, n) = \frac{1}{n} \sum_{j=1}^{n} \delta_j^{\gamma}(k, n) \tag{15}$$

where $\delta_j(k, n)$ represents the $k$th neighbor distance from the $j$th reference point when $n$ points are randomly chosen from a test set. Here, all $n$ sample points are used as reference points. When $n$ is large, it can be shown that

$$\left(\Delta^{(\gamma)}(k, n)\right)^{1/\gamma} = (a n)^{-1/D(\gamma)} \left[\frac{\Gamma(k + \gamma/D(\gamma))}{\Gamma(k)}\right]^{1/\gamma} \tag{16}$$

where $a$ is some constant independent of $\gamma$. Note that the average of $\delta_j^{\gamma}$ from a single set is used in Eq. (15), whereas the derivation of Eq. (16) is based on the ensemble probability. For large $k$, a simple approximate relation can be obtained:

$$\left(\Delta^{(\gamma)}(k, n)\right)^{1/\gamma} \simeq n^{-1/D(\gamma)}\, k^{1/D(\gamma)}\, G(k, \gamma) \tag{17}$$

where $G(k, \gamma)$ is a correction function close to unity. According to Eq. (17), the Dimension Function $D(\gamma)$ can, in principle, be obtained from the slope of the best-fit straight line in the log-log plot with either a fixed $n$ or a fixed $k$. When $k = 1$, the equation reduces to the key relation in Eq. (12) for the nearest neighbor method. With the k-neighbor method, we used a fixed value of $n$. The correction function $G(k, \gamma)$ generally exhibits a periodic pattern as a direct consequence of the self-similarity of fractals, as seen in Fig. 12. By fixing $n$ instead of $k$, we can extract a global property of a given set, which makes the k-neighbor method less sensitive to local anomalies which often arise from a finite sampling process.
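The k-neighbor estimate at fixed $n$ can be sketched for one-dimensional samples as follows. This is a hedged illustration, not the implementation used in this work: the function names are hypothetical, the correction function is ignored (treated as constant), and $D(\gamma)$ is taken as the reciprocal of the slope of the log of the $1/\gamma$ power of the average $k$th neighbor distance against $\ln(k/n)$. It assumes $\gamma \neq 0$ and more sample points than the largest $k$.

```python
import math


def kth_neighbor_moments(points, k_values, gamma):
    """Return {k: mean over all reference points of (k-th neighbor distance)**gamma}."""
    xs = sorted(points)
    n = len(xs)
    k_max = max(k_values)
    if k_max >= n:
        raise ValueError("need more sample points than the largest k")
    totals = {k: 0.0 for k in k_values}
    for i, x in enumerate(xs):
        # In one dimension the k nearest neighbors lie within k places in sorted order.
        window = xs[max(0, i - k_max): i + k_max + 1]
        distances = sorted(abs(x - y) for y in window)  # distances[0] is the point itself
        for k in totals:
            totals[k] += distances[k] ** gamma
    return {k: total / n for k, total in totals.items()}


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def dimension_from_k_scaling(points, k_values, gamma):
    """Estimate D(gamma) as 1 / slope of (1/gamma) ln(mean) against ln(k / n)."""
    n = len(points)
    moments = kth_neighbor_moments(points, k_values, gamma)
    xs = [math.log(k / n) for k in k_values]
    ys = [math.log(moments[k]) / gamma for k in k_values]
    return 1.0 / least_squares_slope(xs, ys)
```

Choosing `k_values` such as `[2, 4, 8, 16, 32]` spaces the fit points evenly on a logarithmic scale; the range of $k$ is a free choice that affects the fitted slope.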


IV. NUMERICAL IMPLEMENTATION 

When dealing with a fractal set numerically, one needs to confine oneself to a finite sample. For a Cantor-like set, the number of iterations $m$ needs to be finite. The hierarchy degree $m$ should be chosen in such a way that two points in neighboring intervals of the set are distinguishable within the precision of a given numerical environment. In our experiment, $m = 30$ is typically used, and therefore we assume a finite representation of the Cantor set which consists of $2^{30}$ intervals. Generally, the larger the number of reference points $n$, the more accurate the result. However, increasing $n$ correspondingly increases the computation time and, as we discuss below, there is another limitation on $n$ as well.

When the number of sample points exceeds the number of broken intervals, the expected probability distribution does not produce a desirable result, since the distribution within an interval is nothing but that of a line interval. Therefore, the scaling property needs to be obtained for $n$ sufficiently smaller than $2^m$ but large enough to accurately reflect a given fractal set. Each of the $n$ points is randomly assigned to a particular one among the $2^m$ intervals. The position of the point is then randomly chosen within the window of the chosen interval. Therefore, in our model, most of the sample points are taken from points which are not in the real Cantor set. However, in principle, we can always set an upper limit on the distance between the sample points and the closest points in a true set by taking $m$ sufficiently large. Choosing a particular interval randomly among $2^{30}$ intervals amounts to randomly generating 30 binary digits. This can be seen by assigning 0 to the left interval and 1 to the right interval on each level of the Cantor set. For the uniform Cantor set, the probabilities of generating 0 and 1 are each exactly one half. For the generalized Cantor set, the corresponding weight factors are introduced.
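The sampling procedure can be sketched as below. This is an illustrative Python sketch and not the C++ code used in this work; the function name and arguments are assumptions. Python's `random.Random` happens to use the Mersenne Twister generator as well.

```python
import random


def sample_cantor_points(n, m, l0, l1, p0, rng):
    """Draw n points from the degree-m finite representation of a generalized Cantor set."""
    points = []
    for _ in range(n):
        left, length = 0.0, 1.0
        for _ in range(m):
            if rng.random() < p0:
                # Binary digit 0: descend into the left sub-interval (probability p0).
                length *= l0
            else:
                # Binary digit 1: descend into the right sub-interval (probability 1 - p0).
                left += length * (1.0 - l1)
                length *= l1
        # Uniform position inside the window of the chosen degree-m interval.
        points.append(left + rng.random() * length)
    return points
```

For example, `sample_cantor_points(2**16, 30, 1/3, 1/3, 0.5, random.Random(0))` draws $2^{16}$ points from the uniform Cantor set at hierarchy degree 30.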

The Mersenne Twister pseudo-random number generator$^{23}$ for C++ was our primary choice for obtaining random numbers. The built-in C++ random number generator was also used. No idiosyncratic behavior arising from the particular choice of random number generator was observed. Due to the limitation on the size of $n$, an ensemble average, rather than a larger $n$, must be employed in order to achieve higher accuracy. The number of members of the ensemble required to stabilize the result depends on the range of $\gamma$. See Section VI A for details.


V. RESULTS 


Generally, with a small amount of computational time, both of the methods in the fixed-mass class give good indications of the Rényi dimension in the vicinity of the box-counting dimension ($q = 0$) on various generalized Cantor sets. This is a major advantage over the box-counting method if one seeks to find the box-counting dimension. Around the box-counting dimension, the nearest neighbor method yields a result closest to the analytical solutions. However, as $\gamma$ moves away from it, the k-neighbor method produces more accurate results. Therefore, at this point, no single method seems reliable enough for an extended domain of $q$ of the generalized dimension. However, the combination of the aforementioned methods reveals the essential features of a given set, such as whether it is a monofractal or a multifractal.

For a multifractal set, how the dimension changes over the domain of $q$ is a key property. The k-neighbor method seems to be the best method to start with, as it can provide an estimate of the generalized dimension over an extended region, albeit not a very accurate one. To obtain the dimension to a higher accuracy for a particular $q$ or $\gamma$, the box-counting or the nearest neighbor method may be used. For $q > 1$, the box-counting method should be employed, and for $q < 1$, the nearest neighbor method, provided that $q$ is not a very large negative number. Therefore, if possible, the results obtained from these methods should be compared and examined to see if they are consistent within the uncertainty of each method.


A. Nearest Neighbor Method 


FIG. 4. [Plot title: Convergence of $D(\gamma)$.] This figure shows how increasing $n = 2^k$ affects the value of $-\gamma \ln n / \ln M_\gamma$. The plot was generated for the uniform Cantor set. The analytical value of $D(\gamma)$ for all $\gamma$ is $\log 2/\log 3 = 0.630\ldots$, which corresponds to the horizontal line in the plot.

In the nearest neighbor method, the Dimension Function $D(\gamma)$ was extracted from Eq. (13). Before the limit is taken, the right-hand side of Eq. (13) reads $-\gamma \ln n / \ln M_\gamma(n)$. To investigate how it approaches the limit, $-\gamma \ln n / \ln M_\gamma$ versus $\ln n$ for the uniform Cantor set was plotted in Fig. 4. The points in the plot indicate how $-\gamma \ln n / \ln M_\gamma$ seemingly approaches the theoretical limit of $\ln 2/\ln 3 = 0.63\ldots$ as $\ln n$ increases in the case of the uniform Cantor set. However, it can be seen that the convergence is rather slow. Provided that $m$ is large enough, increasing $n$ almost always yields a higher accuracy around the box-counting dimension. However, since the convergence is rather slow, determining the limit is not a trivial task. For $\gamma = 1$, $n = 2^{9} = 512$ sample points were required to obtain the result within 5% accuracy and $n = 2^{17}$ to obtain the result within 3%. For quick simulations, we typically used $n = 2^{16}$ and 10 ensembles. In general, we employed the linear regression technique and obtained the limit from the slope of the appropriate log-log plot. While the overall qualitative features of the Dimension Function, such as the non-decreasing property, are properly reflected on the domain where $\gamma$ is positive, the deviations and the fluctuations around $\gamma = -1$ seem sudden and uncontrolled. The difficulty of obtaining a sensible result for $\gamma < -1$ seems persistent throughout the sets we have tested. In Fig. 5, the results for various generalized Cantor sets are shown; the domain of $\gamma$ on which the simulated $D(\gamma)$ agrees well with the analytical results is between 0 and 2. For a multifractal, as $\gamma$ increases, the numerical results start to diverge from the analytical result as well.

FIG. 5. [Plot title: Typical Results of $D(\gamma)$ for various sets; recovered panel titles: Uniform Cantor Set, MBP; horizontal axis $\gamma$; legend: Analytical, Nearest Neighbor, k-neighbor.] These plots show the typical results of $D(\gamma)$ for the nearest neighbor method and the k-neighbor method applied to four different sets, together with the corresponding analytical values. “Unit Segment” here means an interval of unit length and can be thought of as the 0th finite representation of the Cantor set. For negative $\gamma$, the numerical results of the nearest neighbor method persistently deviate considerably from the analytical results. While the k-neighbor method works relatively well for all $\gamma$, the outcome may not be as accurate as that of the nearest neighbor method for small positive $\gamma$.


B. fc-Neighbor Method 

Unlike the nearest neighbor method, where the choice of n is often
limited by the available finite sample and computational time, the
k-neighbor method can utilize a larger data set from which the slope
is extracted to estimate D(γ). The fine structure is clearly
observed in a log-log plot, which injects arbitrariness into the
slope-fitting process. This point is covered in detail in Section VI.
For a fixed value of n, D(γ) or, to be precise, the corresponding
1/D(γ) in Eq. (17), is taken as the slope of log δ_γ(k, n) versus
log(k/n). As shown in Fig. 12, the obtained δ_γ(k, n) exhibits a
periodic pattern, so every approach to obtaining the slope seems to
inject ambiguity. We have used the standard linear regression
technique 24 with sample points equally spaced on the logarithmic
scale of k rather than on the linear k scale. Another consideration
is that the slope, and therefore the result for D(γ), depends on the
range to which the linear regression is applied. It turns out that
the best range seems to differ for different γ, as shown in Fig. 6.
The plot shows how D(γ) varies as the upper bound of the fitting
range is increased, for the uniform Cantor set with the analytical
dimension of log 2/log 3 = 0.63... for all γ.
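
The fitting procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used for the reported results: the function names (`log_spaced_k`, `fit_dimension`, `dimension_vs_upper_bound`) and the synthetic power-law input are assumptions made for the example, and ordinary least squares via `numpy.polyfit` stands in for the regression routine of Ref. 24. The input `delta` plays the role of δ_γ(k, n), assumed to scale as (k/n)^(1/D(γ)).

```python
import numpy as np


def log_spaced_k(k_min, k_max, num):
    """Distinct integer k values roughly equally spaced in log k."""
    ks = np.round(np.geomspace(k_min, k_max, num)).astype(int)
    return np.unique(ks)


def fit_dimension(ks, delta, n):
    """Least-squares slope of log(delta) versus log(k/n); returns 1/slope."""
    slope, _intercept = np.polyfit(np.log(ks / n), np.log(delta), 1)
    return 1.0 / slope


def dimension_vs_upper_bound(ks, delta, n, min_points=5):
    """Refit while raising the upper bound of the k range (compare Fig. 6)."""
    return [
        (int(ks[j - 1]), fit_dimension(ks[:j], delta[:j], n))
        for j in range(min_points, len(ks) + 1)
    ]


# Synthetic check: an exact power law delta = (k/n)^(1/D), D = log 2 / log 3.
n = 10_000
d_true = np.log(2) / np.log(3)
ks = log_spaced_k(10, 5_000, 40)
delta = (ks / n) ** (1.0 / d_true)
d_fit = fit_dimension(ks, delta, n)
window = dimension_vs_upper_bound(ks, delta, n)
```

For an exact power law every refit returns the same dimension. On simulated data the refits differ, and the spread of the values in `window` gives a rough measure of the ambiguity introduced by the choice of range.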


[Figure 6: k-range dependence of D(γ) for the uniform Cantor set; horizontal axis: maximum k.]

FIG. 6. This plot shows how D(γ) differs when a different
range is used to extract the slope in the k-neighbor method.
For the uniform Cantor set, increasing the upper bound of k
generally seems to produce better results. However, this is
not a general result.


As a result of these findings, we have used two different boundaries
for computing the slope, one for positive γ and the other for
negative γ, to produce the final results. Since the inaccuracy
inherited from these ambiguities cannot be entirely removed by
increasing n, as it can in the nearest neighbor method, it is more
difficult to adjust the k-neighbor method to obtain a better result
before knowing the theoretical values. Nevertheless, aside from
these ambiguities, the k-neighbor method works for both positive and
negative ranges of q and is therefore a good candidate as an initial
method to investigate a given set. In the simulation, the ordering
of the n points from the reference points according to their
relative position takes most of the computational time. Since the
ordering takes more time as the topological dimension increases, the
method is said to be especially suited for one-dimensional sets.
Furthermore, unlike the nearest neighbor method, the hierarchy
degree m can be quite small. As expected, the scaling region
diminishes as m decreases. However, the Dimension Function deduced
from the best linear fit over the appropriate scaling region
produces acceptable results. For the uniform Cantor set, when m is
as small as 5, we obtained D(q) on the order of 0.6, as shown in
Fig. 7. This shows that, to estimate the fractal dimension with the
k-neighbor method, the finite representation does not necessarily
require a large degree of hierarchy. Hence, the k-neighbor method is
a good candidate for estimating the fractal dimensions when only a
limited hierarchy degree is available.
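
To make the role of the hierarchy degree concrete, the following sketch draws points from a finite representation of the uniform Cantor set and sorts them, the step that dominates the running time. It is an illustration under stated assumptions: the function `sample_uniform_cantor`, its parameter `levels` (the number of middle-third removals, which may differ by one from the m used in the text), and the choice of placing each point uniformly inside a surviving interval are not taken from the paper.

```python
import numpy as np


def sample_uniform_cantor(n, levels, rng=None):
    """Draw n sorted points from a finite representation of the Cantor set.

    Each of the 2**levels surviving intervals (length 3**-levels) is chosen
    with equal probability, and the point is uniform inside it (assumption).
    """
    rng = np.random.default_rng(rng)
    # One ternary digit (0 or 2) per removal step selects the left or right third.
    digits = 2 * rng.integers(0, 2, size=(n, levels))
    weights = 3.0 ** -np.arange(1, levels + 1)
    left_endpoints = (digits * weights).sum(axis=1)
    points = left_endpoints + rng.random(n) * 3.0 ** -levels
    return np.sort(points)


points = sample_uniform_cantor(1000, levels=5, rng=0)
```

With `levels=0` the sampler reduces to the unit interval, so the same routine can be used to compare coarse and fine representations at a fixed n.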

[Figure 7: m dependence of D(γ) in the k-neighbor method; horizontal axis: m.]

FIG. 7. These plots show how the results for D(γ) change as
m varies when the k-neighbor method is applied to the m-th
finite representation of the uniform standard Cantor set. The
theoretical value of D(γ) is log(2)/log(3) for all γ. For all
iterations the value of n is fixed at 10000. The k-neighbor
method provides relatively good results even when m is as
small as 5.

VI. ANALYSIS 
A. Range and Stability 

In the nearest neighbor method, the probability distribution
P(δ, n) plays a key role, as seen in Eq. (12). Hence, it is
worthwhile to investigate the nature of the probability
distributions associated with fractal sets. Starting with the
conjecture for the mathematical form of the cumulative distribution
function for the uniform Cantor set,

\[ S(\delta, n) = 1 - \exp\left[-n (2\delta)^{D_0}\right] \tag{18} \]

Badii and Politi argue that the correct form of the probability
density distribution of the uniform Cantor set for n ≫ 1 is given
by 10

\[ P(\delta, n) = 2 D_0\, n\, (2\delta)^{D_0 - 1} \exp\left[-n (2\delta)^{D_0}\right] \tag{19} \]

Note that the gamma function, Eq. (20), is singular for
nonpositive integer z, 25

\[ \Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt \tag{20} \]

By substituting Eq. (19) into Eq. (12), a simple computation
yields

\[ M_\gamma(n) = \frac{1}{2^{\gamma}} \left(\frac{1}{n}\right)^{\gamma/D_0} \int_0^{\infty} x^{\gamma/D_0} e^{-x}\, dx \tag{21} \]

where x = n(2δ)^{D_0}. The integral is Γ(1 + γ/D_0), which diverges
for γ ≤ −D_0. Therefore, the function M_γ(n) involves singularities
for γ ≤ −D_0. This means that, for the generalized Cantor set, the
nearest neighbor method is ill-suited for obtaining the Correlation
Dimension (q = 2) or the dimensions for larger q. The results for
D(γ) obtained with the nearest neighbor method for four different
data sets are shown in Fig. 5. In each plot, the numerical results
are compared to the corresponding analytical results. The influence
of the singularity is observed for a variety of sets. The k-neighbor
method does not suffer from this kind of singularity in practice: a
corresponding singularity can be found in Eq. (16), but it can be
avoided by taking a sufficiently large k. Accordingly, the
k-neighbor method could generate sensible results over the entire
range of γ we have investigated.
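
Under the conjectured form of Eq. (18), the variable x = n(2δ)^{D_0} is exponentially distributed with unit mean. Assuming, as in Eq. (12), that M_γ(n) is the average of δ^γ over P(δ, n), the moment can be checked numerically against the closed form (1/2)^γ n^(−γ/D_0) Γ(1 + γ/D_0). The sketch below is only an illustration; the function names, the sample size, and the seed are arbitrary choices made here.

```python
import math

import numpy as np


def moment_closed_form(gamma, n, d0):
    """(1/2)^gamma * n^(-gamma/d0) * Gamma(1 + gamma/d0), for gamma > -d0."""
    if gamma <= -d0:
        raise ValueError("the integral diverges for gamma <= -D_0")
    return 0.5 ** gamma * n ** (-gamma / d0) * math.gamma(1.0 + gamma / d0)


def moment_monte_carlo(gamma, n, d0, samples=200_000, seed=0):
    """Average of delta^gamma with delta drawn from the CDF of Eq. (18)."""
    rng = np.random.default_rng(seed)
    x = rng.exponential(size=samples)      # x = n (2 delta)^d0 is Exp(1)
    delta = 0.5 * (x / n) ** (1.0 / d0)    # invert the change of variables
    return float(np.mean(delta ** gamma))


d0 = math.log(2) / math.log(3)
exact = moment_closed_form(1.0, 1000, d0)
estimate = moment_monte_carlo(1.0, 1000, d0)
```

As γ approaches −D_0 from above, the sample average is dominated by the few smallest distances and converges poorly, which is the numerical counterpart of the singularity discussed above.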

It is worth noting that the simulated probability distribution
functions did not completely converge to the theoretical
distribution of Eq. (19). The Kolmogorov-Smirnov goodness-of-fit
test measures the maximum discrepancy between two cumulative
distributions and was employed to compare the theoretical
distribution given by Eq. (18) with D_0 = ln 2/ln 3 and the
distribution obtained in simulations. As seen in Fig. 8, the
simulated distribution for the uniform Cantor set approaches the
theoretical distribution with D_0 = ln 2/ln 3 as m increases. One
would reasonably expect the convergence to keep improving, but this
was not observed. When the number of points N = 2^k exceeds the
number of intervals 2^m, the nearest point for each reference point
is likely to fall in the same interval, and therefore the result of
the K-S goodness-of-fit test steadily decreases while m < k.
However, the maximum discrepancy reaches a plateau when m = k,
suggesting that there is a constant disparity between the two
distributions which does not diminish even when the finite
representation of the Cantor set has a large hierarchy degree m.
The results of the K-S test are shown in Fig. 8, where the simulated
distribution is compared against the theoretical distribution of
Eq. (19) with different values of D_0. Among the values used, the
theoretical distribution with D = D_0 = ln 2/ln 3 showed the best
fit for m > 14.
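
The comparison can be sketched as a one-sample Kolmogorov-Smirnov statistic between a set of nearest neighbor distances and the conjectured cumulative distribution of Eq. (18). The code below is a minimal illustration with hypothetical names (`badii_politi_cdf`, `ks_statistic`); for a self-contained check, the distances are drawn from Eq. (18) itself rather than from a simulated Cantor set.

```python
import numpy as np


def badii_politi_cdf(delta, n, d):
    """Conjectured CDF of Eq. (18): 1 - exp[-n (2 delta)^d]."""
    return 1.0 - np.exp(-n * (2.0 * np.asarray(delta)) ** d)


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF of `samples` and `cdf`."""
    x = np.sort(np.asarray(samples, dtype=float))
    size = x.size
    f = cdf(x)
    above = np.arange(1, size + 1) / size - f   # empirical CDF just after x
    below = f - np.arange(0, size) / size       # empirical CDF just before x
    return float(max(above.max(), below.max()))


# Stand-in data: distances drawn from Eq. (18) with D_0 = ln 2 / ln 3.
n, d0 = 100, np.log(2) / np.log(3)
rng = np.random.default_rng(0)
distances = 0.5 * (rng.exponential(size=10_000) / n) ** (1.0 / d0)

ks_matched = ks_statistic(distances, lambda t: badii_politi_cdf(t, n, d0))
ks_wrong_d = ks_statistic(distances, lambda t: badii_politi_cdf(t, n, 1.0))
```

Smaller values indicate a better fit, so scanning the trial dimension `d` and plotting the statistic against m would reproduce the kind of comparison summarized in Fig. 8.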

The effective domain is also related to the stability of the
method. For both methods, as |γ| increases, the nearest distance δ
is either amplified or attenuated. Consequently, the contribution
from only a few sample points among the n chosen points starts to
dominate the integral or sum in the equations. Unlike the nearest
neighbor method, however, the effect of a few sample points is
relatively small in the k-neighbor method owing to its global
nature. For the nearest neighbor method, simulations require a
large number of ensembles and, therefore, an extensive amount of
computational time and memory for negative γ of relatively large
magnitude. How the Dimension Function D(γ) varies from one
implementation to another in the nearest neighbor method is shown in
Fig. 9. As γ increases, the values of D(γ) fluctuate more when
computed with the same number of sample points.
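
The dominance of a few sample points can be made explicit with a small numerical example. The helper `largest_term_share` and the four distances below are hypothetical and chosen only for illustration; the function reports the fraction of the sum of δ^γ that comes from its single largest term.

```python
import numpy as np


def largest_term_share(delta, gamma):
    """Fraction of sum(delta**gamma) contributed by the largest single term."""
    terms = np.asarray(delta, dtype=float) ** gamma
    return float(terms.max() / terms.sum())


distances = [0.001, 0.01, 0.1, 0.2]
share_neg1 = largest_term_share(distances, -1.0)   # smallest distance dominates
share_neg3 = largest_term_share(distances, -3.0)
share_pos1 = largest_term_share(distances, 1.0)    # largest distance dominates
share_pos3 = largest_term_share(distances, 3.0)
```

For negative γ the smallest distance carries most of the weight, and for positive γ the largest one does; in both cases the share grows with |γ|, so the estimate depends increasingly on a single reference point.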

[Figure 8: K-S Goodness of Fit Test for the Uniform Cantor Set with n = 2^15.]
[Plot: Stability Analysis; horizontal axis: Iteration.]

FIG. 8. The Kolmogorov-Smirnov goodness-of-fit test was
used to compare the simulated probability density distribution
and the theoretical distribution proposed by Badii and
Politi for the uniform Cantor set with n = 2^15. According
to Eq. (18), various values between 0 and 1 were substituted
for D_0 for the purpose of this test. Smaller values of the
outcome indicate a better fit. The finite representation of the
Cantor set with m = 1 is the unit interval. Therefore, as
expected, the test function with D = 1 exhibits the best fit
there. As m increases, the K-S statistic decreases for
D = D_0 = ln 2/ln 3 and similar values. However, they reach
plateaus after m = 15.

This difficulty can be partially overcome by employing the "near"
neighbor instead of the nearest neighbor, as it makes the simulation
less dependent on the local property of a single reference point.
However, it eventually suffers from the same difficulty as the
magnitude of γ increases. The results for D(γ) obtained with the
near neighbor method are shown in Fig. 10. The integer i denotes the
i-th neighbor point included in the partitions, with i = 1 being the
nearest neighbor method. Moreover, as i increases, all the relevant
equations need to be modified accordingly, but the dependence on i
is not obvious. Overall, the k-neighbor method has an advantage
for large |γ|.
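
For a one-dimensional sample, the i-th neighbor distance of every point can be obtained by brute force, as in the following sketch. The function name `ith_neighbor_distances` is hypothetical, every point is used as a reference point (an assumption made for brevity), and the quadratic memory cost limits this version to small samples.

```python
import numpy as np


def ith_neighbor_distances(points, i):
    """Distance from each point to its i-th nearest neighbor (i = 1 is nearest)."""
    x = np.asarray(points, dtype=float)
    pairwise = np.abs(x[:, None] - x[None, :])   # all pairwise distances
    # After sorting each row, column 0 is the zero self-distance,
    # so column i holds the distance to the i-th neighbor.
    return np.sort(pairwise, axis=1)[:, i]


sample = [0.0, 1.0, 3.0, 7.0]
nearest = ith_neighbor_distances(sample, 1)
second = ith_neighbor_distances(sample, 2)
```

Averaging `distances ** gamma` over the returned array gives the sample moment for the chosen i, so the same routine covers both the nearest neighbor case and the "near" neighbor variants compared in Fig. 10.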


B. The Limitation of Numerical Methods 

As shown in Figs. 11 and 12, plots of the probability distribution
P(δ, n) of δ for the nearest neighbor method or of the k-th neighbor
distance δ_γ(k, n) typically exhibit self-similar fine structures
which arise from the original fractal geometry. However, unless a
construction recipe is known in advance, as in the case of the
generalized Cantor set, the exact nature of the fine structure is
difficult to obtain. Moreover, finding its exact nature is
essentially redundant, for it would be another fractal set as
complex as the original one. Hence, numerical methods are typically
developed on the assumption that these fine structures will not
affect their outputs in any substantial way. Nevertheless, we should
not simply ignore the effect of the fine structures, as a set would
not be a fractal without them. In equations such as Eqs. (12) and
(17), the fine structures are absorbed by the constant or correction
term. In general, these correction terms depend on the hierarchy
degree used in creating a test set as well as on the number of
sample points. However, it is difficult to estimate the error
attributed to the correction term, and this raises a question
concerning the reliability of the method.

FIG. 9. This figure shows that each iteration of the simulation
generates a different outcome for D(γ). Sample sets
were taken from the uniform Cantor set. As γ increases, the
results fluctuate more. Larger fluctuation indicates more
sensitive dependence on a particular choice of sample set. For
negative γ, the outcome fluctuates even more and the average
of the outcome is significantly smaller than the theoretical
prediction, which is roughly 0.63.

[Figure 10: panel title "MBP Generalized Cantor Set"; horizontal axes: γ.]

FIG. 10. These plots show how using the near neighbor instead
of the nearest neighbor affects the result. The integer i denotes
the i-th neighbor. While increasing i generally makes D(γ)
smoother, one cannot expect the results to improve when i
is increased.

In principle, the largest possible m should be used to reflect the
infinite hierarchical self-similarity. For the nearest neighbor
method, the number of reference points, n, needs to be smaller than
2^m. Therefore, to increase n to obtain more accurate results, one
needs to increase m as well. However, unlike in the case of sample
points, where increasing n generally guarantees a more accurate
result, increasing m does not necessarily do so. As can be seen in
Fig. 7, once m reaches a certain threshold, increasing m will not
produce a better result.

[Figure 11: PDF with n = 2^15 for the Nearest Neighbor Method.]

FIG. 11. These plots show how the hierarchy degree m affects
the probability distribution of the nearest neighbor method.
The sample sets were taken from the uniform Cantor set.
While the cumulative distribution is somewhat more stable,
as m increases, the fine structure of the probability
distribution of δ emerges, exhibiting self-similar patterns. A
limited horizontal range from 0 to 3^(−15) is plotted.

[Figure 12: Non-Uniform Cantor Set.]

FIG. 12. δ_γ(k, n) = (Δ_γ(k, n))^(1/γ) is plotted versus k in
a log-log plot. The fine structure inherited from the
non-uniform Cantor set is observed.


method, which employs partitions of distributed mass,
are good candidates for estimating the generalized fractal
dimension for negative q. The k-neighbor method works for the
complete range of q, and no serious deviations were found. By
choosing an appropriate scaling region, it is possible to estimate
the generalized dimensions even with a small hierarchy degree.
However, the method involves linear regression, and the results
depend on how the best-fit line is obtained. Therefore, the
k-neighbor method is a good option as a starting point for
investigating the general outlook of D_q. If the sample size is
large, the nearest neighbor method can be the best method for small
negative q. Although the result is sensitive to local anomalies,
one can choose the size of n according to the precision required to
extract the dimension. However, in contrast with the k-neighbor
method, the hierarchy degree, m, also needs to be sufficiently large
in order to obtain a desirable probability distribution. Therefore,
if the sample size of a finite representation is small, the nearest
neighbor method is not a practical choice. For positive q, the
methods with partitions of equal size may be used. In general, a
few different methods should be applied to determine whether their
results are consistent. The k-neighbor method should provide the
overall features of D_q. Given that the subjective choice of the
best-fit line affects the result, it is important to determine the
window of ambiguity. If the sample size is adequate, one should
apply the nearest neighbor method for negative q and box-counting
or a similar method for positive q. The results from these two
different methods should lie within the window of ambiguity.

In any simulation of the kind worked out in this paper, the finite
sample correction needs to be taken care of. Although a number of
correction terms have been proposed over the years, 11,26 many of
them add extra complications to the simulation without achieving a
dramatic increase in the method's accuracy. 11,20,27 In the process
of exploring the form of the nearest neighbor distribution of the
generalized Cantor set, some interesting properties have been
obtained; in particular, the limits m → ∞ and n → ∞ may not
commute, contrary to what is usually assumed. Since a numerical
sample only possesses a finite hierarchy, a new algorithm which
does not assume an infinite hierarchy may be useful. In future work
it will be shown that a new analysis of generalized dimension may
be based on some quantities that are independent of the hierarchy.